ComStreamClust: a Communicative Multi-Agent Approach to Text Clustering in Streaming Data
Authors
Abstract
Topic detection is the task of determining and tracking hot topics in social media. Twitter is arguably the most popular platform for people to share their ideas with others about different issues. One such prevalent issue is the COVID-19 pandemic. Detecting and tracking topics on these kinds of issues would help governments and healthcare companies deal with this phenomenon. In this paper, we propose a novel, multi-agent, communicative clustering approach, so-called ComStreamClust, for clustering sub-topics inside a broader topic, e.g., COVID-19 and the FA Cup. The proposed approach is parallelizable and can simultaneously handle several data points. The LaBSE sentence embedding is used to measure the semantic similarity between two tweets. ComStreamClust has been evaluated by several metrics such as keyword precision, keyword recall, and topic recall. Based on topic recall for different numbers of keywords, ComStreamClust obtains superior results when compared to existing methods.
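As a rough illustration of how embedding-based similarity can drive a streaming assignment step, the sketch below computes cosine similarity between two vectors and assigns each incoming point to the most similar existing cluster, or opens a new one. This is a generic threshold-based scheme under stated assumptions, not the ComStreamClust algorithm itself: the function names, the 0.7 threshold, and the toy 3-dimensional vectors (standing in for LaBSE tweet embeddings) are all hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def assign_to_cluster(
    embedding: Sequence[float],
    centroids: List[List[float]],
    threshold: float = 0.7,  # hypothetical value, not taken from the paper
) -> int:
    """Return the index of the most similar centroid, or open a new cluster.

    Centroids are kept fixed after creation to keep the sketch short;
    a real streaming method would normally update them.
    """
    best_index, best_sim = -1, -1.0
    for i, centroid in enumerate(centroids):
        sim = cosine_similarity(embedding, centroid)
        if sim > best_sim:
            best_index, best_sim = i, sim
    if best_index >= 0 and best_sim >= threshold:
        return best_index
    centroids.append(list(embedding))
    return len(centroids) - 1


# Toy vectors standing in for sentence embeddings of three tweets.
stream = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
centroids: List[List[float]] = []
labels = [assign_to_cluster(v, centroids) for v in stream]
print(labels)
```

The first two vectors point in nearly the same direction and share a cluster, while the third is orthogonal to both and starts a new one.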
Similar resources
A Multi-Objective Approach to Fuzzy Clustering using ITLBO Algorithm
Data clustering is one of the most important areas of research in data mining and knowledge discovery. Recent research in this area has shown that the best clustering results can be achieved using multi-objective methods. In other words, assuming more than one criterion as objective functions for clustering data can measurably increase the quality of clustering. In this study, a model with two ...
Efficient streaming text clustering
Clustering data streams has been a new research topic, recently emerged from many real data mining applications, and has attracted a lot of research attention. However, there is little work on clustering high-dimensional streaming text data. This paper combines an efficient online spherical k-means (OSKM) algorithm with an existing scalable clustering strategy to achieve fast and adaptive clust...
A New Approach to Credibility Premium for Zero-Inflated Poisson Models for Panel Data
The main goal of this research is to derive and compare credibility premiums in count models with unreported claims for longitudinal data. In this research, predictive premiums are computed under the squared-error and exponential loss functions and compared with each other. The desire to earn bonuses and rewards is one of the main reasons for not reporting accidents, and policyholders often refrain from reporting low-cost accidents in order to benefit from discounts. In this research ...
A Multi-Agent Approach to Arabic Handwritten Text Segmentation
The segmentation of individual words into characters is a vital process in handwritten character recognition systems. In this paper, a novel approach is proposed to segment handwritten Arabic text (words). We consider the “Naskh” font style. The segmentation algorithm employs seven agents in order to detect regions where segmentation is illegal. Feature points (end points) are extracted from th...
A Graph-Based Clustering Approach to Identify Cell Populations in Single-Cell RNA Sequencing Data
Introduction: The emergence of single-cell RNA-sequencing (scRNA-seq) technology has provided new information about the structure of cells, and provided data with very high resolution of the expression of different genes for each cell at a single time. One of the main uses of scRNA-seq is data clustering based on expressed genes, which sometimes leads to the detection of rare cell populations. ...
Journal
Journal title: Annals of Data Science
Year: 2022
ISSN: 2198-5804, 2198-5812
DOI: https://doi.org/10.1007/s40745-022-00426-4